AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.84)

Neural Information Processing Systems, Oct-7-2024, 09:54:48 GMT

Reviews: Explaining Deep Learning Models -- A Bayesian Non-parametric Approach

I think the rebuttal is prepared very well. Although the assumption of a single component approximating the local decision boundary is quite strong, the paper nonetheless offers a good, systematic approach to interpreting black-box ML systems. It is an important topic and I don't see a lot of studies in this area. Overview: In an effort to improve scrutability (the ability to extract generalizable insight) and explainability of a black-box target learning algorithm, the current paper proposes to use infinite Dirichlet mixture models with multiple elastic nets (DMM-MEN) to map the inputs to the predicted outputs. Any target model can be approximated by a non-parametric Bayesian regression mixture model.

bayesian non-parametric approach, explainability, mixture model, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Pal, Samyajoy, Heumann, Christian

Variational Approach for Efficient KL Divergence Estimation in Dirichlet Mixture Models

arXiv.org Machine Learning, Mar-18-2024

This study tackles the efficient estimation of Kullback-Leibler (KL) Divergence in Dirichlet Mixture Models (DMM), crucial for clustering compositional data. Despite the significance of DMMs, obtaining an analytically tractable solution for KL Divergence has proven elusive. Past approaches relied on computationally demanding Monte Carlo methods, motivating our introduction of a novel variational approach. Our method offers a closed-form solution, significantly enhancing computational efficiency for swift model comparisons and robust estimation evaluations. Validation using real and simulated data showcases its superior efficiency and accuracy over traditional Monte Carlo-based methods, opening new avenues for rapid exploration of diverse DMM models and advancing statistical analyses of compositional data.
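The contrast the abstract draws between Monte Carlo and closed-form estimation can be illustrated with a minimal sketch. The variational estimate below follows the well-known Hershey-Olsen style approximation for mixtures, built from the closed-form KL divergence between two Dirichlet distributions; it is an assumption chosen for illustration and is not necessarily the estimator derived in the paper. All function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, digamma, logsumexp


def kl_dirichlet(a, b):
    """Closed-form KL( Dir(a) || Dir(b) )."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a0 = a.sum()
    return float(gammaln(a0) - gammaln(a).sum()
                 - gammaln(b.sum()) + gammaln(b).sum()
                 + np.dot(a - b, digamma(a) - digamma(a0)))


def kl_mixture_variational(w_f, alphas_f, w_g, alphas_g):
    """Hershey-Olsen style variational approximation of KL(f || g)
    for two Dirichlet mixtures (illustrative, not the paper's estimator)."""
    w_f, w_g = np.asarray(w_f, float), np.asarray(w_g, float)
    kl_ff = np.array([[kl_dirichlet(a, a2) for a2 in alphas_f] for a in alphas_f])
    kl_fg = np.array([[kl_dirichlet(a, b) for b in alphas_g] for a in alphas_f])
    num = logsumexp(-kl_ff, b=w_f, axis=1)  # log sum_a' w_a' exp(-KL(f_a || f_a'))
    den = logsumexp(-kl_fg, b=w_g, axis=1)  # log sum_b  w_b  exp(-KL(f_a || g_b))
    return float(np.dot(w_f, num - den))


def dirichlet_logpdf(x, a):
    """Log density of Dir(a) at each row of x (rows lie on the simplex)."""
    a = np.asarray(a, float)
    return gammaln(a.sum()) - gammaln(a).sum() + (np.log(x) * (a - 1.0)).sum(axis=-1)


def mixture_logpdf(x, w, alphas):
    comp = np.stack([dirichlet_logpdf(x, a) for a in alphas], axis=-1)
    return logsumexp(comp + np.log(w), axis=-1)


def sample_mixture(rng, w, alphas, n):
    alphas = np.asarray(alphas, float)
    z = rng.choice(len(w), size=n, p=w)   # component label per sample
    g = rng.gamma(alphas[z])              # normalised Gamma draws are Dirichlet
    return g / g.sum(axis=1, keepdims=True)


def kl_mixture_monte_carlo(w_f, alphas_f, w_g, alphas_g, n=20000, seed=0):
    """Sampling-based baseline: average log f(x) - log g(x) over x ~ f."""
    rng = np.random.default_rng(seed)
    x = sample_mixture(rng, w_f, alphas_f, n)
    return float(np.mean(mixture_logpdf(x, w_f, alphas_f)
                         - mixture_logpdf(x, w_g, alphas_g)))
```

The variational estimate needs only a small matrix of pairwise Dirichlet divergences, while the Monte Carlo baseline has to draw and score thousands of samples, which is the efficiency gap the abstract refers to.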

dirichlet mixture model, divergence, kl divergence, (11 more...)

2403.12158

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Afrin, Kahkashan, Iquebal, Ashif S., Karimi, Mostafa, Souris, Allyson, Lee, Se Yoon, Mallick, Bani K.

Directionally Dependent Multi-View Clustering Using Copula Model

arXiv.org Machine Learning, Mar-16-2020

In recent biomedical scientific problems, it is a fundamental issue to integratively cluster a set of objects from multiple sources of datasets. Such problems are mostly encountered in genomics, where data are collected from various sources and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging due to their complex dependence structure, including directional dependency. Particularly in genomics studies, it is known that there is certain directional dependence between DNA expression, DNA methylation, and RNA expression, widely called The Central Dogma. Most of the existing multi-view clustering methods assume either an independent structure or pair-wise (non-directional) dependency, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model where a copula enables the model to accommodate the directional dependence existing in the datasets. We conduct a simulation experiment where the simulated datasets exhibit inherent directional dependence: it turns out that ignoring the directional dependence negatively affects the clustering performance. As a real application, we applied our model to the breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).

dataset, dependence, directional dependence, (16 more...)

2003.07494

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Xu, Hongteng, Zha, Hongyuan

A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering

Neural Information Processing Systems, Feb-14-2020, 07:44:03 GMT

How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point process: the Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet process as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm.

dirichlet mixture model, event sequence clustering, hawkes process, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

arXiv.org Machine Learning, Feb-2-2020

Infinite Mixture of Inverted Dirichlet Distributions

Ma, Zhanyu, Lai, Yuping

In this work, we develop a novel Bayesian estimation method for the Dirichlet process (DP) mixture of the inverted Dirichlet distributions, which has been shown to be very flexible for modeling vectors with positive elements. The recently proposed extended variational inference (EVI) framework is adopted to derive an analytically tractable solution. The convergence of the proposed algorithm is theoretically guaranteed by introducing a single lower bound approximation to the original objective function in the VI framework. In principle, the proposed model can be viewed as an infinite inverted Dirichlet mixture model (InIDMM) that allows the automatic determination of the number of mixture components from data. Therefore, the problem of pre-determining the optimal number of mixing components has been overcome. Moreover, the problems of over-fitting and under-fitting are avoided by the Bayesian estimation approach. Compared with several recently proposed DP-related methods, the good performance and effectiveness of the proposed method have been demonstrated with both synthesized data and real data evaluations.
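How a Dirichlet process prior avoids fixing the number of components in advance can be seen from the standard stick-breaking construction of its mixing weights. The sketch below is a generic illustration under a finite truncation, not the EVI algorithm from the paper; the function name and parameters are hypothetical.

```python
import numpy as np


def stick_breaking_weights(concentration, truncation, seed=0):
    """Draw truncated stick-breaking weights for a DP prior.

    Each Beta(1, concentration) draw breaks off a fraction of the
    remaining stick; the last fraction is set to 1 so that the
    truncated weights sum to one.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


weights = stick_breaking_weights(concentration=2.0, truncation=20)
# Components with negligible weight are effectively switched off, so the
# number of active components is driven by the data and the concentration.
n_active = int((weights > 1e-3).sum())
```

A smaller concentration value puts most of the mass on the first few sticks, while a larger one spreads it over more components.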

artificial intelligence, machine learning, mixture model, (17 more...)

1807.10693

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Xu, Hongteng, Zha, Hongyuan

A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering

Neural Information Processing Systems, Dec-31-2017

How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point process: the Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet process as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can learn structural triggering patterns hidden in asynchronous event sequences robustly and achieve superior performance on clustering purity and consistency compared to existing methods.
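The generative side of this model, in which sequences in different clusters come from Hawkes processes with different parameters, can be sketched with a small simulator. The example assumes a univariate process with an exponential kernel, intensity mu + alpha * sum(exp(-beta * (t - t_i))), and uses Ogata's thinning algorithm; the paper's own parameterisation may differ, and the names below are hypothetical.

```python
import numpy as np


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by Ogata thinning.

    Stationarity requires alpha / beta < 1 under this parameterisation.
    """
    rng = np.random.default_rng(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound for the thinning step.
        lam_bar = mu + excitation
        wait = rng.exponential(1.0 / lam_bar)
        excitation *= np.exp(-beta * wait)
        t += wait
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.uniform() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha
    return np.array(events)


# Two hypothetical clusters: weakly and strongly self-exciting sequences.
calm = [simulate_hawkes(1.0, 0.1, 1.0, 50.0, seed=s) for s in range(3)]
bursty = [simulate_hawkes(1.0, 0.8, 1.0, 50.0, seed=s) for s in range(3)]
```

A clustering method of the kind described above would have to recover this grouping from the event times alone.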

artificial intelligence, hawkes process, machine learning, (16 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Apr-6-2017

DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

Sun, Zhe, Wang, Ting, Deng, Ke, Wang, Xiao-Feng, Lafyatis, Robert, Ding, Ying, Hu, Ming, Chen, Wei

Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technological advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. Methods: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. An expectation-maximization algorithm is used for parameter inference. Results: We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods.
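The per-cell clustering uncertainty mentioned at the end of the abstract can be illustrated with the E-step of an EM algorithm. The sketch assumes that each cluster has its own Dirichlet parameter vector and that counts follow the resulting Dirichlet-multinomial distribution, which is one natural reading of "a Dirichlet mixture prior" over UMI counts; it is not taken from the DIMM-SC code, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def dm_loglik(counts, alpha):
    """Dirichlet-multinomial log-likelihood per cell.

    counts: (n_cells, n_genes) UMI counts; alpha: (n_genes,) parameters.
    The multinomial coefficient is omitted because it is the same for
    every cluster and cancels in the posterior.
    """
    counts = np.asarray(counts, float)
    alpha = np.asarray(alpha, float)
    total = counts.sum(axis=1)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + total)
            + (gammaln(counts + alpha) - gammaln(alpha)).sum(axis=1))


def responsibilities(counts, weights, alphas):
    """E-step: posterior probability of each cluster for each cell."""
    log_post = np.stack(
        [np.log(w) + dm_loglik(counts, a) for w, a in zip(weights, alphas)],
        axis=1,
    )
    return np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))


# Toy data: three genes, two hypothetical clusters.
counts = np.array([[50, 1, 1], [2, 1, 40], [10, 10, 10]])
alphas = [np.array([10.0, 1.0, 1.0]), np.array([1.0, 1.0, 10.0])]
post = responsibilities(counts, weights=[0.5, 0.5], alphas=alphas)
```

Each row of `post` sums to one. Rows close to (1, 0) or (0, 1) indicate confident assignments, while rows nearer (0.5, 0.5) flag cells whose cluster membership is uncertain.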

artificial intelligence, machine learning, scrna-seq data, (15 more...)